HES 505 Fall 2024: Session 22
By the end of today you should be able to:
Articulate the differences between statistical learning classifiers and logistic regression
Describe classification trees and their relationship to Random Forests
Describe MaxEnt models for presence-only data
\[ \begin{equation} F(\mathbf{s}) = f(w_1X_1(\mathbf{s}), w_2X_2(\mathbf{s}), w_3X_3(\mathbf{s}), ..., w_mX_m(\mathbf{s})) \end{equation} \]
Logistic regression treats \(f(x)\) as a (generalized) linear function
Allows for multiple qualitative classes
Ensures that estimates of \(F(\mathbf{s})\) are bounded in [0,1]
Dependent variable must be binary
Observations must be independent (important for spatial analyses)
Predictors should not be collinear
Predictors should be linearly related to the log-odds
Sample size must be adequate for the number of predictors
Logistic (and other generalized linear models) are relatively interpretable
Probability theory allows robust inference of effects
Predictive power can be low
Relaxing the linearity assumption can help
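A minimal sketch of the logistic regression setup above, using simulated data (all variable names here are hypothetical): the log-odds are linear in the predictor, and the inverse-logit keeps fitted values in [0,1].

```r
# Simulate a binary outcome whose log-odds are linear in x,
# then recover the relationship with base R's glm().
set.seed(505)
n <- 500
x <- rnorm(n)
p <- plogis(-0.5 + 1.2 * x)          # inverse-logit keeps p in [0, 1]
y <- rbinom(n, size = 1, prob = p)   # binary dependent variable

fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)                              # estimates should be near (-0.5, 1.2)
head(predict(fit, type = "response"))  # fitted probabilities, all in [0, 1]
```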
Use decision rules to segment the predictor space
Series of consecutive decision rules form a ‘tree’
Terminal nodes (leaves) are the outcome; internal nodes (branches) the splits
Divide the predictor space into \(J\) non-overlapping regions (\(R_1, R_2, ..., R_J\))
Every observation in \(R_j\) gets the same prediction
Recursive binary splitting
Pruning and over-fitting
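One step of recursive binary splitting can be sketched in base R: scan candidate cutpoints on a single predictor and keep the one that minimizes the weighted Gini impurity of the two child regions. Function and variable names are illustrative, not from any package.

```r
# Gini impurity of a set of 0/1 labels
gini <- function(y) { p <- mean(y); 2 * p * (1 - p) }

# Find the cutpoint on x that best separates the classes in y
best_split <- function(x, y) {
  cuts  <- sort(unique(x))
  cuts  <- head(cuts, -1)        # a split must leave both sides non-empty
  score <- sapply(cuts, function(cp) {
    left <- y[x <= cp]; right <- y[x > cp]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  cuts[which.min(score)]
}

# Outcome switches from all 0 to all 1 after x = 5
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
best_split(x, y)   # -> 5: every observation in each region gets the same prediction
```

A full tree repeats this search within each child region until a stopping rule (then pruning) is applied.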
Inputs from the dismo package
The sample data
Building our dataframe
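A self-contained sketch of the dataframe this workflow produces (column names hypothetical): presence points flagged `pa = 1`, background points `pa = 0`, each paired with covariate values. In practice the covariates come from `extract()`-ing raster layers at the point locations; here they are simulated so the sketch runs on its own.

```r
set.seed(1)
n_pres <- 20; n_back <- 100

pts <- data.frame(
  lon = runif(n_pres + n_back, -117, -111),   # made-up study extent
  lat = runif(n_pres + n_back, 42, 45),
  pa  = c(rep(1, n_pres), rep(0, n_back))     # presence vs. background flag
)
# Fake two "extracted" covariate layers so the example is self-contained
pts$elev   <- rnorm(nrow(pts), 1500, 300)
pts$precip <- rnorm(nrow(pts), 400, 100)

str(pts)   # one row per point: coordinates, pa flag, covariates
```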
Benefits
Easy to explain
Links to human decision-making
Graphical displays
Easy handling of qualitative predictors
Costs
Lower predictive accuracy than other methods
Not necessarily robust
Grow hundreds (or thousands) of trees, each on a bootstrapped sample of the data
Random sample of predictors considered at each split
Decorrelates the trees, so their prediction errors are less likely to be shared
Averaging across the trees usually improves predictive performance
Lots of extensions
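A toy base-R illustration of the two ingredients above, bootstrap resampling and a random predictor subset at each split; each "tree" here is a single stump (one split), so this sketches the idea rather than reproducing the randomForest package.

```r
set.seed(42)
n <- 300
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- as.integer(X$x1 + 0.5 * X$x2 + rnorm(n, sd = 0.5) > 0)

# Grow one stump on a bootstrap sample, choosing among mtry random predictors
grow_stump <- function(X, y, mtry = 2) {
  idx  <- sample(nrow(X), replace = TRUE)      # bootstrap sample
  vars <- sample(names(X), mtry)               # random predictor subset
  Xb <- X[idx, ]; yb <- y[idx]
  # keep the variable whose median split best separates the classes
  best <- vars[which.max(sapply(vars, function(v)
    abs(mean(yb[Xb[[v]] > median(Xb[[v]])]) -
        mean(yb[Xb[[v]] <= median(Xb[[v]])]))))]
  cut <- median(Xb[[best]])
  hi  <- as.integer(mean(yb[Xb[[best]] > cut]) >= 0.5)
  function(Xnew) ifelse(Xnew[[best]] > cut, hi, 1L - hi)
}

forest <- lapply(1:100, function(i) grow_stump(X, y))
votes  <- rowMeans(sapply(forest, function(tree) tree(X)))
pred   <- as.integer(votes >= 0.5)             # average (majority vote) of trees
mean(pred == y)                                # ensemble accuracy
```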
Opportunistic collection of presences only
Hypothesized predictors of occurrence are measured (or extracted) at each presence
Background points (or pseudoabsences) generated for comparison
What constitutes background?
Not measuring probability, but relative likelihood of occurrence
Sampling bias affects estimation
The intercept
\[ \begin{equation} y_{i} \sim \text{Bern}(p_i)\\ \text{link}(p_i) = \mathbf{x_i}'\beta + \alpha \end{equation} \]
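A simulated sketch of why the intercept \(\alpha\) is problematic with presence/background data: quadrupling the number of background points shifts the fitted intercept by roughly log(4), because \(\alpha\) absorbs the presence:background ratio, while the slope on the covariate stays comparatively stable. All names and distributions are hypothetical.

```r
set.seed(7)
pres_x <- rnorm(200, mean = 1)        # presences come from higher covariate values

# Fit a presence/background logistic regression with n_back background points
fit_pb <- function(n_back) {
  back_x <- rnorm(n_back, mean = 0)   # background sampled from the study area
  y <- c(rep(1, length(pres_x)), rep(0, n_back))
  x <- c(pres_x, back_x)
  coef(glm(y ~ x, family = binomial))
}

c1 <- fit_pb(1000)
c2 <- fit_pb(4000)
c1; c2   # intercepts differ by roughly log(4); slopes are similar
```

This is one way to see the "relative likelihood, not probability" point: the covariate effect is estimable, but the baseline rate is not.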
MaxEnt (after the original software)
Need plausible background points across the remainder of the study area
Iterative fitting finds the distribution of maximum entropy (i.e., the one closest to spatially uniform) that is consistent with the covariate values at the presence locations
Tuning parameters to account for differences in sampling effort, placement of background points, etc
Development of the model beyond the scope of this course, but see Elith et al. 2010
Not measuring probability, but relative likelihood of occurrence
Sampling bias affects estimation (but can be mitigated using tuning parameters)
Theoretical issues with background points and the intercept
Recent developments relate MaxEnt (with cloglog links) to Inhomogeneous Point Process models
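The cloglog connection can be sketched in base R: under the inverse cloglog, \(p = 1 - e^{-e^{\eta}}\) is exactly the probability that a Poisson process with intensity \(e^{\eta}\) produces at least one point in a unit cell, and at low intensity \(p \approx e^{\eta}\).

```r
# Inverse complementary log-log link
cloglog_inv <- function(eta) 1 - exp(-exp(eta))

eta    <- -5                   # a low-intensity cell
lambda <- exp(eta)             # Poisson intensity in the cell
p      <- cloglog_inv(eta)     # Pr(at least one occurrence)

c(lambda = lambda, p = p)      # nearly identical at low intensity
1 - dpois(0, lambda)           # the same probability from the Poisson pmf
```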
Polynomials, splines, piece-wise regression
Neural nets, Support Vector Machines, many many more